RFC: Reintroduce nnz and nonzeros #6963

ViralBShah · 2014-05-25T15:23:05Z

Reintroduce nnz and nonzeros as discussed in #6769
Deprecate nfilled
Deprecate nonzeros for StridedArray and BitArray
find()/findnz() over sparse matrices no longer filters out stored zeros

Should sparse() allow stored zeros in its input, using an optional argument? Although this discussion started with nnz, I feel it is leading towards better clarity on treatment of stored zeros, and overall I am feeling good about that.
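To make the distinction between stored entries and actual nonzeros concrete, here is a minimal sketch. It assumes a recent Julia with the SparseArrays standard library, where `sparse` retains explicit zeros passed to it; `countnz`, as named in this thread, was removed in later versions, so `count(!iszero, ...)` stands in for it.

```julia
using SparseArrays

# Entry (2, 2) is stored explicitly even though its value is zero.
A = sparse([1, 2, 3], [1, 2, 3], [1.0, 0.0, 3.0])

stored = nnz(A)                       # number of stored (structural) entries
actual = count(!iszero, nonzeros(A))  # stored entries whose value is nonzero

println("stored entries:  ", stored)
println("nonzero entries: ", actual)
```

The two counts differ by exactly the number of explicitly stored zeros.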

tknopp · 2014-05-25T19:00:59Z

base/sparse/sparsematrix.jl

    return (I, J, V)
 end

+nonzeros!(S::SparseMatrixCSC) = S.nzval


Is it a convention to use ! for return types? Sorry to ask, I just was not aware of that.

You are right - it is not.

tknopp · 2014-05-25T20:38:13Z

I am still a little surprised that nonzeros returns a copy. I can understand the reasoning behind it (the sparse object should not become corrupted). But is this a common pattern? This kind of cries out for something like a read-only array.

ViralBShah · 2014-05-25T21:22:16Z

This would certainly be the expected behaviour. For access to the underlying data structures of SparseMatrixCSC, we should probably have accessor methods. We will do that early in 0.4. For now, people already directly pick up S.nzval.

mlubin · 2014-05-26T01:56:47Z

Returning a copy might be the expected behavior, but it also makes nonzeros essentially useless for the vast majority of cases that I'm aware of. The name nonzeros! seems like an abuse of notation; it doesn't make any modifications to its argument.

I don't really see any benefit that these methods bring. If you're using nonzeros!, you already know enough about the data structure to go ahead and use S.nzval. If you don't know the data structure, then the list of nonzero entries isn't too useful since you don't know what indices they correspond to, and findnz would be used instead. This plus the huge naming conflict leads me to say that we should just get rid of nonzeros altogether.

JeffBezanson · 2014-05-26T02:11:05Z

The ! should only be used if the argument is modified.

ViralBShah · 2014-05-26T04:13:28Z

@mlubin I have used nonzeros in this manner quite a bit when I used to work with MATLAB. I think the name is not confusing, and the behaviour is not a surprise to those who expect it. nonzeros! is already gone.

Just out of curiosity, @eldadHaber - Can your use case of nonzeros be replaced by findnz? The only downside of findnz is that it would allocate memory for indexes, which you may not want.

Cc: @tkelman

mlubin · 2014-05-26T05:11:29Z

doc/stdlib/sparse.rst

@@ -75,4 +75,10 @@ Sparse matrices support much of the same set of operations as dense matrices. Th

   Return the symmetric permutation of A, which is ``A[p,p]``. A should be symmetric and sparse, where only the upper triangular part of the matrix is stored. This algorithm ignores the lower triangular part of the matrix. Only the upper triangular part of the result is returned as well.

+.. function:: nonzeros!(A)


This got left over

Thanks. Removed it.

mlubin · 2014-05-26T05:14:30Z

@ViralBShah, what's a typical use case where one would want to make a copy of nzval?

tkelman · 2014-05-26T05:15:19Z

find()/findnz() over sparse matrices no longer filters out stored zeros

Not sure that's a good idea; you're just moving the inconsistent treatment of zeros from nonzeros into find. I think find should be left alone. If a function for converting CSC to COO that leaves stored zeros intact is desired, I think it should be a separate function (or a conversion, if we add a full-fledged COO type).

A construction flag in sparse to allow explicit zeros sounds reasonable.

At this point I don't see a whole lot of use for nonzeros(A) relative to A.nzval, except I guess for the sake of naming familiarity from Matlab where you don't typically have direct access to the underlying CSC data.
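For reference, a small sketch of how explicit zeros can be kept at construction and dropped afterwards. This assumes a recent Julia with the SparseArrays standard library, whose `dropzeros` function postdates this thread; it is not the construction flag proposed above.

```julia
using SparseArrays

A = sparse([1, 2, 3], [1, 2, 3], [1.0, 0.0, 3.0])  # one explicit zero at (2, 2)
B = dropzeros(A)   # copy without stored zeros; dropzeros!(A) would work in place

println("stored entries before: ", nnz(A))
println("stored entries after:  ", nnz(B))
```

Both matrices compare equal with `==`, since only the storage pattern differs, not the values.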

ViralBShah · 2014-05-26T08:59:25Z

A.nzval is not recommended. We really need to have an API for that, or else we will be stuck with the internal structure.

tknopp · 2014-05-26T09:05:32Z

I am with @mlubin and @tkelman here. At this point nonzeros is just a convenience function for Matlab users, and funnily enough the Matlab function should be even more efficient, as it has copy-on-write (COW) semantics.

Of course we have similar issues with all the temporaries that are created in the arithmetic vector operations, but unlike +, where one really has no choice but to create a temporary array, here the array is already allocated.

So from the user perspective "the right thing" to do is currently to use the fields of SparseMatrixCSC. This is ok if we decide that fields are part of an interface (see #4935 and #1974). But reading through both issues I don't see that there is already a conclusion about this. If we decide that only functions are the public interface and field overloading is not coming, then we would need accessor methods like e.g. nonzeros.

Maybe we should introduce a convention that accessor functions that may corrupt the state of an object should be named nonzeros💀 ;-)

But joking aside, I think that we need decisions on #1974, as this will affect the way Base will evolve during the journey to 1.0.

As for this PR, I still like @mlubin's (#5538) naming convention better, as it is more explicit and circumvents the uncertainty that we have about structural and actual nonzeros. And I still do not see the pressing reason to make Matlab compatibility so important (other than one user being unhappy). But as we have talked about that already in #6769, I think it is time to just make a decision for 0.3.

tkelman · 2014-05-26T09:44:57Z

What's the concern with field access, name stability? Wanting to apply the same code to other sparse formats with potentially different field names?

I doubt I'd use an API that's dramatically slower than field access when both are equally convenient. Having a non-copy field accessor function could be worth using, the concern about modification there is more or less the same as with slices, transposes, etc potentially returning views.

tknopp · 2014-05-26T10:08:11Z

@tkelman. Yes, it's about name stability and "good practice". It is always good to separate the interface from the implementation. Using functions for interfaces is what's currently done in Base (e.g. size, length), and performance should not be an issue when using inlining. But again, looking at #4935 and #1974, there is not yet a clear conclusion on whether this is the route to follow in the future. Field overloading would break that paradigm and make fields part of an interface. Note that my comment is about a non-copying accessor.

It is true that one could argue that the interface of SparseMatrixCSC is nothing that will be reused so that information hiding is probably not that important here. Still I think it would be good to have conventions that we follow even in cases where it is probably not so important.
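As an illustration of the "functions as interface" style discussed here, the sketch below walks the stored entries of a matrix through accessor functions only. It assumes a recent Julia with the SparseArrays standard library; `nzrange` was added after this thread, and `column_sums` is a hypothetical helper written for this example.

```julia
using SparseArrays

# Sum each column by walking the stored entries through accessor
# functions only, never touching the fields of the matrix type.
function column_sums(A::SparseMatrixCSC)
    vals = nonzeros(A)                    # stored values, not copied
    sums = zeros(eltype(A), size(A, 2))
    for j in 1:size(A, 2)
        for k in nzrange(A, j)            # positions of column j's stored entries
            sums[j] += vals[k]
        end
    end
    return sums
end

# Entries: (1,1) = 1.0, (3,1) = 2.0, (2,3) = 5.0; column 2 is empty.
A = sparse([1, 3, 2], [1, 1, 3], [1.0, 2.0, 5.0], 3, 3)
println(column_sums(A))
```

Code written this way keeps working if the field names of the type change, which is the name-stability concern raised above.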

lindahua · 2014-05-26T12:44:48Z

I understand some of the reasoning behind the copy. However, knowing that nzval will be copied in nonzeros would make me circumvent this function altogether and go directly to nzval in practice.

I would suggest that nonzeros return the vector of nonzero values itself (or a view thereof), so that the values can be accessed without being copied, or manipulated directly. This should also be clearly documented, and we should warn the user about the potential consequences of modifying the elements there.

ViralBShah · 2014-05-26T13:01:13Z

One choice is to leave nonzeros as is and have a separate api that returns all the internal vectors without making copies.

stevengj · 2014-05-26T14:00:50Z

I agree with @lindahua; nonzeros should just return the data, not a copy.

ViralBShah · 2014-05-26T16:32:02Z

nonzeros no longer returns a copy.
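A short sketch of what the non-copying behaviour means in practice, assuming a recent Julia with the SparseArrays standard library, where `nonzeros` returns the stored values themselves:

```julia
using SparseArrays

A = sparse([1, 2], [1, 2], [1.0, 2.0])
vals = nonzeros(A)   # the stored values themselves, not a copy
vals[1] = 10.0       # writes through to A

println(A[1, 1])
```

Because the vector aliases the matrix storage, writing into it changes the matrix, which is the caveat raised earlier about mutating access.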

mlubin · 2014-05-26T16:36:28Z

Cool, now on to the next controversial issue :)
I'd agree with @tkelman that find and findnz should remove the explicitly stored zeros. If a user is interacting with a sparse matrix at this higher level, I don't think the concept of structural nonzeros needs to be introduced (except maybe as a keyword argument?)
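For callers who want triplets without the explicitly stored zeros, the filtering can be sketched as follows. This assumes a recent Julia with the SparseArrays standard library; `findnz_nonzero` is a hypothetical helper name, not part of any library. The filter is applied explicitly, so the result does not depend on whether `findnz` itself reports stored zeros.

```julia
using SparseArrays

# Hypothetical helper: triplets of the stored entries whose value is nonzero.
function findnz_nonzero(A::SparseMatrixCSC)
    rows, cols, vals = findnz(A)
    keep = .!iszero.(vals)               # mask out explicitly stored zeros
    return rows[keep], cols[keep], vals[keep]
end

A = sparse([1, 2, 3], [1, 2, 3], [1.0, 0.0, 3.0])  # explicit zero at (2, 2)
rows, cols, vals = findnz_nonzero(A)
println((rows, cols, vals))
```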

ViralBShah · 2014-05-26T16:41:24Z

I wasn't sure about that one, but wanted to try it out. I think it is best to remove it and avoid the cognitive overload.

ViralBShah · 2014-05-26T16:50:13Z

Alright, what else do you want? :-)

ViralBShah · 2014-05-26T16:57:32Z

Good point - done.

mlubin · 2014-05-26T17:01:29Z

doc/stdlib/sparse.rst


+   Return a vector of the structural nonzero values in sparse matrix ``A``. This includes zeros that are explicitly stored in the sparse matrix.


Maybe indicate here that the vector has mutating access to the original values?

mlubin · 2014-05-26T17:05:59Z

I'm satisfied on all of the substantial issues here. I think we're just missing deprecations from the 0.2 version of nnz that applied to general vectors:

@deprecate nnz(v::AbstractVector) countnz(v)

(assuming that works correctly). Also NEWS has a mention of nfilled that should be updated.

ViralBShah · 2014-05-26T17:08:34Z

Why only AbstractVector? What about AbstractArray?

ViralBShah · 2014-05-26T17:09:17Z

I mean we should do it for StridedArray.

mlubin · 2014-05-26T17:09:35Z

That's probably right.
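The deprecation pattern under discussion can be sketched with stand-in names. `old_count` and `new_count` below are hypothetical and only play the roles of `nnz` and `countnz`; the sketch assumes a recent Julia, where the warning is printed only when running with `--depwarn=yes`.

```julia
# Hypothetical names, standing in for nnz and countnz.
new_count(v::AbstractArray) = count(!iszero, v)

# Calls to old_count are forwarded to new_count, with a deprecation
# warning emitted when deprecation warnings are enabled.
@deprecate old_count(v::AbstractArray) new_count(v)

println(old_count([1, 0, 2, 0]))
```

When the old method is written with a signature, the replacement is given as a full call expression, since it becomes the body of the generated method.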

mlubin · 2014-05-26T17:30:42Z

NEWS.md

@@ -320,7 +320,7 @@ Deprecated or removed

  * `factorize!` is deprecated in favor of `factorize`. ([#5526])

-  * `nnz` is removed. Use `countnz` or `nfilled` instead ([#5538])
+  * `nnz` counts the number of structural nonzeros in a sparse matrix. Use `countnz` for the actual number of nonzeros. ([#6769])


There's also a link farther down for the issue [#5538]

Updated the link as well.

I think it is more accurate to replace counts with returns:

nnz returns the number of structural nonzeros ...

This method is not doing the counting.
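The difference between returning and counting can be sketched with a toy type. `TinyCSC`, `stored_count`, and `nonzero_count` are hypothetical names used only for illustration; the layout assumes the usual compressed sparse column convention with 1-based column pointers, and is not taken from the PR's source.

```julia
# Hypothetical stand-in for a CSC matrix; only the parts needed here.
struct TinyCSC
    colptr::Vector{Int}      # column j's entries sit at colptr[j]:(colptr[j+1] - 1)
    nzval::Vector{Float64}   # stored values, may include explicit zeros
end

# Number of stored entries: read off the last column pointer, no loop.
stored_count(S::TinyCSC) = S.colptr[end] - 1

# Number of entries that are actually nonzero: has to visit every value.
nonzero_count(S::TinyCSC) = count(!iszero, S.nzval)

S = TinyCSC([1, 2, 3, 4], [1.0, 0.0, 3.0])
println((stored_count(S), nonzero_count(S)))
```

The first function does constant work regardless of matrix size, which is why "returns" describes it better than "counts".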

Deprecate nfilled for SparseMatrixCSC. Deprecate nonzeros for StridedArray and BitArray. Deprecate nnz for StridedArray. Update documentation for structural nonzeros. Update NEWS

mlubin · 2014-05-26T18:48:46Z

Anyone have remaining strong objections?

tknopp · 2014-05-26T19:02:05Z

LGTM. Thanks @ViralBShah for taking all objections into account!

lindahua · 2014-05-26T20:11:52Z

@ViralBShah I think it is great. Thanks.

ViralBShah added the sparse label May 25, 2014

ViralBShah added this to the 0.3 milestone May 25, 2014

ViralBShah self-assigned this May 25, 2014

ViralBShah changed the title ~~Reintroduce nnz and nonzeros~~ RFC: Reintroduce nnz and nonzeros May 25, 2014

tknopp reviewed May 25, 2014

mlubin reviewed May 26, 2014

Reintroduce nnz and nonzeros as discussed in #6769

67f083a

Deprecate nfilled for SparseMatrixCSC. Deprecate nonzeros for StridedArray and BitArray. Deprecate nnz for StridedArray. Update documentation for structural nonzeros. Update NEWS

ViralBShah added a commit that referenced this pull request May 27, 2014

Merge pull request #6963 from JuliaLang/vs/nnz

8879b88

RFC: Reintroduce nnz and nonzeros

ViralBShah merged commit 8879b88 into master May 27, 2014

ViralBShah deleted the vs/nnz branch May 27, 2014 04:31
